Narrowband Internet of Things (NB-IoT) is a new cellular radio access technology, based on Long-Term Evolution (LTE), introduced by the Third-Generation Partnership Project (3GPP) for low-power wide-area networks (LPWAN). Its main objectives are to enable low-power, low-cost, low-data-rate communication and to support massive machine-type communication (mMTC); in an era where everything is connected to the internet, NB-IoT thus provides end customers with a higher quality of service (QoS) than earlier IoT technologies. Resource allocation is one of the more difficult tasks in uplink transmission, and while numerous schemes for efficient resource distribution have been proposed, a comprehensive and fully successful solution has not yet been provided. In this article, we propose a resource allocation technique based on reinforcement learning (RL), a subset of machine learning in which an agent acts in an environment and gathers rewards, both positive and negative. We use the Twin Delayed Deep Deterministic Policy Gradient (TD3) algorithm, an improvement over its predecessor, Deep Deterministic Policy Gradient (DDPG), and tune the RL parameters for better efficiency across latency, throughput, energy efficiency, fairness, and reward.
Introduction
Narrowband Internet of Things (NB-IoT) is a low-power, cost-effective cellular technology developed by 3GPP to connect IoT devices with expanded coverage and efficient spectrum use. It coexists with LTE and 5G, operating in three modes: standalone, in LTE carriers, or in guard bands. NB-IoT uses a narrow 180 kHz bandwidth and relatively low data rates suitable for its applications. Resource allocation in NB-IoT involves managing uplink/downlink subcarriers and optimizing parameters like latency, throughput, and energy efficiency.
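The 180 kHz carrier and its uplink subcarrier structure can be illustrated with a small calculation. This sketch is ours, not the article's code; the two spacings (15 kHz and 3.75 kHz) come from the standard NB-IoT numerology:

```python
# Illustrative NB-IoT uplink numerology: a single 180 kHz carrier
# divided into subcarriers at the two spacings NB-IoT supports.
CARRIER_BW_HZ = 180_000

def num_subcarriers(spacing_hz: int) -> int:
    """Number of uplink subcarriers that fit in the 180 kHz carrier."""
    return CARRIER_BW_HZ // spacing_hz

print(num_subcarriers(15_000))  # 15 kHz spacing  -> 12 subcarriers
print(num_subcarriers(3_750))   # 3.75 kHz spacing -> 48 subcarriers
```

The scheduler's job, at this level of abstraction, is to decide which of these 12 (or 48) subcarriers each device may transmit on in a given slot.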
Because efficient resource use and massive device connectivity are challenging, reinforcement learning (RL) is proposed to improve resource allocation. RL enables adaptive decision-making: an agent interacts with the environment and learns to maximize cumulative reward. Beyond value-based methods such as Q-learning, the actor-critic algorithm TD3 improves stability and performance by curbing Q-function overestimation and incorporating target-policy smoothing.
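The two TD3 mechanisms just named, the clipped double-Q target and target-policy smoothing, can be sketched in a few lines. This is a generic illustration of the TD3 target computation (following Fujimoto et al.), not the article's implementation; the toy critics and hyperparameter values are assumptions:

```python
import numpy as np

rng = np.random.default_rng(0)

def td3_target(reward, next_state, q1_target, q2_target, policy_target,
               gamma=0.99, noise_std=0.2, noise_clip=0.5,
               a_low=-1.0, a_high=1.0):
    """Clipped double-Q learning target with target-policy smoothing."""
    # Smooth the target action with clipped Gaussian noise.
    noise = np.clip(rng.normal(0.0, noise_std), -noise_clip, noise_clip)
    a_next = np.clip(policy_target(next_state) + noise, a_low, a_high)
    # Take the minimum of the two target critics to curb overestimation.
    q_min = min(q1_target(next_state, a_next), q2_target(next_state, a_next))
    return reward + gamma * q_min

# Toy constant critics just to show the clipped minimum in action:
q1 = lambda s, a: 1.0
q2 = lambda s, a: 2.0
pi = lambda s: 0.0
print(td3_target(0.5, None, q1, q2, pi))  # 0.5 + 0.99 * min(1.0, 2.0) = 1.49
```

The third TD3 ingredient, delayed actor updates, simply means the policy and target networks are updated once every few critic updates.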
The study focuses on managing uplink NPUSCH (Narrowband Physical Uplink Shared Channel) transmissions—scheduling devices, adapting links, and allocating resources—using RL algorithms such as DQN, PPO, and TD3, and comparing them on throughput, latency, fairness, energy efficiency, and reward. The simulation models NB-IoT device behavior during connection, random access, data transmission, and retransmission, emphasizing the need for online learning algorithms that minimize transmission delay and adapt dynamically to network conditions.
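Of the comparison metrics listed above, fairness is the least self-explanatory. The article does not specify which fairness measure it uses; a common choice for per-device throughput, assumed here purely for illustration, is Jain's fairness index:

```python
def jains_fairness(throughputs):
    """Jain's index over per-device throughputs.

    Returns 1.0 when all devices get equal throughput and
    1/n when a single device receives everything.
    """
    n = len(throughputs)
    total = sum(throughputs)
    return total * total / (n * sum(x * x for x in throughputs))

print(jains_fairness([1.0, 1.0, 1.0, 1.0]))  # 1.0  (perfectly fair)
print(jains_fairness([1.0, 0.0, 0.0, 0.0]))  # 0.25 (one device hogs the channel)
```

A scheduler's reward can fold such an index in alongside throughput and energy terms, which is one way the metrics above trade off against each other.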
The work highlights the limitations of existing RL methods in sample efficiency and learning speed, advocating for advanced RL techniques capable of online learning in real NB-IoT networks.
Conclusion
In conclusion, this research article explores the application of reinforcement learning, specifically the Twin Delayed Deep Deterministic Policy Gradient (TD3) algorithm, to resource allocation in Narrowband Internet of Things (NB-IoT) systems. Through a comprehensive analysis and comparison with existing algorithms such as DQN and PPO, TD3 demonstrates superior performance across metrics including total reward, energy efficiency, throughput, fairness, and latency. The results highlight TD3's effectiveness in optimizing resource allocation, managing data transmission, and minimizing delays in NB-IoT environments. Additionally, the visualization of resource allocation patterns across multiple channels provides valuable insight into the algorithm's behavior and its ability to adapt to different scenarios. Overall, the findings suggest that TD3 offers a promising solution for enhancing the efficiency and effectiveness of resource allocation in NB-IoT systems, paving the way for improved performance and reliability in IoT applications. Further research could fine-tune parameters and explore additional optimization strategies to extend TD3's capabilities in real-world deployment scenarios.